The Emotional Faces of the Public Opinion¶

The Problem¶

  • Due to the design of attention-driven algorithms, most social media users experience echo chambers

  • This limits the ability to see a complete picture of public opinion (and narrows individual opinion)

  • Is there a way to quickly access a different side to the story?

The Aim¶

  • This project aims to take a word, trend, topic or hashtag, together with a sentiment, as input and return a summary of the key points of opinion. Essentially a Google search for public opinion, intended to give people quick access to a wider range of views.

  • Twitter is the home of contemporary public opinion, and therefore the perfect place to start

  • Let's look at some data to get a better understanding

Twitter Data¶

Authentication keys¶

In [1]:
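import tweepy
import pandas as pd


print("You got this!")

# consumer_key, consumer_secret, access_token and access_token_secret must be
# defined before this point; the credentials are not shown in this notebook.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# wait_on_rate_limit=True makes Tweepy wait for the rate limit to reset instead of failing.
api = tweepy.API(auth, wait_on_rate_limit=True)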
You got this!

Let's see a live example

Search/queries¶

Let's use a simple example: Licorice Pizza, a recently released, award-nominated but highly controversial movie.

In [ ]:
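tweets = []

# `since:` is a search operator that belongs inside the query string;
# search_tweets has no `since` keyword argument.
query = "#LicoricePizza since:2022-01-28"

try:
    for tweet in tweepy.Cursor(api.search_tweets, q=query, count=10, lang="en").items(200):
        # Keep only the fields needed for the DataFrame below.
        tweets.append((tweet.created_at, tweet.id, tweet.text,
                       tweet.retweet_count, tweet.favorite_count, tweet.lang))
except tweepy.TweepyException as e:
    # API errors are raised while the cursor fetches pages, so the handler wraps the loop.
    # TweepyException replaced TweepError in Tweepy v4, the version that provides search_tweets.
    print(e)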

# df = pd.DataFrame(tweets, columns = ['created_at','tweet_id', 'tweet_text', "retweet_count", "favorite_count", "lang"])

# """Add the path to the folder you want to save the CSV file in as well as what you want the CSV file to be named inside the single quotations"""
# df.to_csv(path_or_buf='/Users/caselyhayford/Desktop/Twitter Experiments/Tweets.csv', index=False)

Just some of the query possibilities:

[Image: Screenshot 2022-02-19 at 14.04.22.png]

DataFrame Work¶

In [7]:
df = pd.DataFrame(tweets, columns = ['created_at','tweet_id', 'tweet_text', "retweet_count", "favorite_count", "lang"])
In [3]:
df.shape
Out[3]:
(200, 6)
In [9]:
df.head(10)
Out[9]:
   created_at                 tweet_id             tweet_text                                         retweet_count  favorite_count  lang
0  2022-02-19 11:20:56+00:00  1494995410510823424  A 15-year-old falling in love with a 25-year-o...  0              0               en
1  2022-02-19 11:20:47+00:00  1494995374364405760  "Don't be creepy," says the 25 year old to a 1...  0              0               en
2  2022-02-19 11:11:40+00:00  1494993078721159169  RT @universaluk: The nominations are in. #Lico...  2              0               en
3  2022-02-19 11:08:21+00:00  1494992245241442304  #LicoricePizza is too good                         0              0               en
4  2022-02-19 10:59:34+00:00  1494990035552112642  RT @cineastmemes: Finally it's now available i...  1              0               en
5  2022-02-19 10:34:34+00:00  1494983743538630658  now watching #LicoricePizza FINALLY!!!! https:...  0              1               en
6  2022-02-19 10:08:17+00:00  1494977127514447874  #LicoricePizza is now on vod I repeat #Licoric...  0              0               en
7  2022-02-19 10:02:28+00:00  1494975663542915075  Finally it's now available in our channel the ...  1              2               en
8  2022-02-19 10:01:39+00:00  1494975461515923460  watching #LicoricePizza and there’s an actual ...  0              0               en
9  2022-02-19 09:57:46+00:00  1494974480262729729  Finallllllly #LicoricePizza 😌😌 https://t.co/Dn...  0              0               en

Examples of Sentiment in Opinions¶

In [6]:
print(df["tweet_text"][0])
A 15-year-old falling in love with a 25-year-old should not be normalized or romanticized regardless of gender. #LicoricePizza

A negative tweet, highlighting the age-gap issue

In [8]:
print(df["tweet_text"][199])
What a fantastic soundtrack (lovely cover too)! Discovered new favourites thanks to Paul Thomas Anderson. Can't wai… https://t.co/mguHg4buyL
In [10]:
print(df["tweet_text"][3])
#LicoricePizza is too good

Some positive tweets, talking about quality and the amazing soundtrack
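
The labels above were assigned by reading the tweets. As a rough sketch of how the same split could be automated, the snippet below groups tweets by the label a classifier returns. The helper names (`normalise_label`, `make_hf_classifier`, `split_by_sentiment`) are illustrative, and the model name `cardiffnlp/twitter-roberta-base-sentiment` is an assumption about which Hugging Face checkpoint might be used; its raw labels `LABEL_0`, `LABEL_1` and `LABEL_2` correspond to negative, neutral and positive.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Raw label names of the assumed checkpoint, mapped to readable sentiments.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def normalise_label(raw_label: str) -> str:
    """Turn a raw model label into 'negative', 'neutral' or 'positive'."""
    return LABEL_NAMES.get(raw_label, raw_label.lower())


def make_hf_classifier(
    model_name: str = "cardiffnlp/twitter-roberta-base-sentiment",
) -> Callable[[str], str]:
    """Wrap a Hugging Face sentiment pipeline as a text -> label function.

    Assumption: the model name is one possible choice, not a confirmed one.
    The import is local so the rest of the sketch runs without transformers.
    """
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)

    def classify(text: str) -> str:
        # For a single string the pipeline returns [{"label": ..., "score": ...}].
        return normalise_label(classifier(text)[0]["label"])

    return classify


def split_by_sentiment(
    texts: Iterable[str], classify: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group texts under the sentiment label that `classify` assigns to each."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        groups[classify(text)].append(text)
    return dict(groups)
```

With `transformers` installed, `split_by_sentiment(df["tweet_text"], make_hf_classifier())` would return a dictionary keyed by sentiment, from which the subset requested by the user can be picked.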

Approach and Possible Methodology¶

Approximate Steps:

  1. Scrape the corpus (DataFrame) from Twitter

  2. Clean the text data (NLTK or spaCy)

  3. Sentiment classification to split the data into positive and negative tweets (possibly with a neutral class). Here we can either train our own model (risky, as it is hard to generalize to arbitrary tweets without a very large training set) or use transfer learning with a pre-trained model. Say hello to Hugging Face!

  • On Hugging Face there is a powerful tweet sentiment model based on RoBERTa, trained on 58 million tweets. Let's have a go.

  4. Now we have our corpus divided by sentiment. Taking the subset with the sentiment selected by the user, we use doc2vec from gensim (an extension of word2vec to whole documents, here tweets) to embed each tweet. We can then apply deep learning (LSTM or CNN) to group the tweets into clusters based on patterns.

  5. We can visualize these clusters as output and supply the central tweet of each cluster, showing the core positions within the chosen sentiment.
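
The cleaning, embedding and central-tweet steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the regex cleaning stands in for NLTK or spaCy, k-means from scikit-learn is swapped in as a simple clustering step in place of the LSTM/CNN idea, and the function names, `vector_size`, `epochs` and `n_clusters` values are all hypothetical choices.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

RT_RE = re.compile(r"^RT\s+@\w+:\s*")   # retweet prefix such as "RT @user: "
URL_RE = re.compile(r"https?://\S+")    # links
MENTION_RE = re.compile(r"@\w+")        # user mentions


def clean_tweet(text: str) -> List[str]:
    """Strip retweet prefixes, URLs and mentions, then return lower-case tokens."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    # Hashtag words are kept, but the '#' and other punctuation are dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def embed_and_cluster(
    token_lists: Sequence[List[str]], n_clusters: int = 3, seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Embed tweets with Doc2Vec (gensim 4) and cluster them with k-means.

    Assumption: k-means replaces the LSTM/CNN step; imports are local so the
    other helpers run without gensim or scikit-learn installed.
    """
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.cluster import KMeans

    documents = [TaggedDocument(words=tokens, tags=[i])
                 for i, tokens in enumerate(token_lists)]
    model = Doc2Vec(documents=documents, vector_size=50, min_count=1,
                    epochs=40, seed=seed)
    vectors = [model.dv[i].tolist() for i in range(len(documents))]
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(vectors)
    return vectors, labels.tolist()


def central_tweets(
    texts: Sequence[str], vectors: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, str]:
    """Return, for each cluster, the tweet closest to that cluster's mean vector."""
    clusters: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    result: Dict[int, str] = {}
    for label, members in clusters.items():
        dim = len(vectors[members[0]])
        centroid = [sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)]
        best = min(members, key=lambda i: math.dist(vectors[i], centroid))
        result[label] = texts[best]
    return result
```

A possible flow is `tokens = [clean_tweet(t) for t in subset]`, then `vectors, labels = embed_and_cluster(tokens)`, then `central_tweets(subset, vectors, labels)`, where `subset` is the list of tweets with the selected sentiment.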

Or a word cloud?!

[Image: Screenshot 2022-02-19 at 13.31.33.png]
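
A word cloud is driven by word frequencies, so a small sketch of the counting step may help. The stop-word list and the `top_words` helper below are illustrative assumptions; a real version would likely use a fuller stop-word list from NLTK or spaCy.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Tiny illustrative stop-word list (an assumption, not a complete one).
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "with", "it", "on", "now"}


def top_words(tweets: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count words across tweets, ignoring URLs, stop words and very short words."""
    counts: Counter = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", "", text.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)
```

The resulting counts could then be passed to a plotting tool, for example `WordCloud().generate_from_frequencies(dict(top_words(tweets, 100)))` from the `wordcloud` package, if that package is chosen.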

Advanced task¶

For each cluster (topic) of tweets, we generate a summary tweet, written by a GAN model.

Demo Day, Final Product¶

  • On demo day, the goal is for anybody in the audience to suggest a topic and the sentiment they want (or don't want), and for the output (a cluster visualisation and core tweets, possibly with generated summaries) to show the main points of public opinion on their request. The product idea is to let people quickly see another side of a story or topic, based on public opinion.

Thank you for listening!!!¶